Overview

Dataset statistics

Number of variables: 21
Number of observations: 5354259
Missing cells: 38229948
Missing cells (%): 34.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 4.7 GiB
Average record size in memory: 947.8 B
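You can cross-check the overall Missing cells figure against the per-variable Missing counts in the Alerts list and the variable sections. Their sum, divided by rows × variables, should reproduce 34.0%. Below is a minimal Python sketch with those counts copied in by hand. It leaves out the six variables that report 0 missing values, and the 19 belongs to the text variable whose name is not preserved in this export.

```python
# Per-variable missing counts copied by hand from this report.
# Variables reporting 0 missing values are omitted.
missing_per_variable = [
    217425,   # VEHICLE_ID
    573413,   # PERSON_AGE
    2607425,  # EJECTION
    2525502,  # EMOTIONAL_STATUS
    2525459,  # BODILY_INJURY
    2607042,  # POSITION_IN_VEHICLE
    2779911,  # SAFETY_EQUIPMENT
    5267506,  # PED_LOCATION
    5267607,  # PED_ACTION
    2525452,  # COMPLAINT
    194889,   # PED_ROLE
    5268834,  # CONTRIBUTING_FACTOR_1
    5268945,  # CONTRIBUTING_FACTOR_2
    600519,   # PERSON_SEX
    19,       # text variable (name not preserved)
]

n_rows = 5354259
n_vars = 21

# Total missing cells and their share of all cells in the table
missing_cells = sum(missing_per_variable)
missing_pct = 100 * missing_cells / (n_rows * n_vars)

print(missing_cells)          # 38229948
print(f"{missing_pct:.1f}%")  # 34.0%
```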

Variable types

Numeric: 4
DateTime: 2
Text: 3
Categorical: 12

Alerts

PERSON_TYPE is highly imbalanced (86.1%) [Imbalance]
PERSON_INJURY is highly imbalanced (66.0%) [Imbalance]
EJECTION is highly imbalanced (93.4%) [Imbalance]
EMOTIONAL_STATUS is highly imbalanced (74.9%) [Imbalance]
BODILY_INJURY is highly imbalanced (69.3%) [Imbalance]
POSITION_IN_VEHICLE is highly imbalanced (52.9%) [Imbalance]
SAFETY_EQUIPMENT is highly imbalanced (60.4%) [Imbalance]
COMPLAINT is highly imbalanced (75.7%) [Imbalance]
VEHICLE_ID has 217425 (4.1%) missing values [Missing]
PERSON_AGE has 573413 (10.7%) missing values [Missing]
EJECTION has 2607425 (48.7%) missing values [Missing]
EMOTIONAL_STATUS has 2525502 (47.2%) missing values [Missing]
BODILY_INJURY has 2525459 (47.2%) missing values [Missing]
POSITION_IN_VEHICLE has 2607042 (48.7%) missing values [Missing]
SAFETY_EQUIPMENT has 2779911 (51.9%) missing values [Missing]
PED_LOCATION has 5267506 (98.4%) missing values [Missing]
PED_ACTION has 5267607 (98.4%) missing values [Missing]
COMPLAINT has 2525452 (47.2%) missing values [Missing]
PED_ROLE has 194889 (3.6%) missing values [Missing]
CONTRIBUTING_FACTOR_1 has 5268834 (98.4%) missing values [Missing]
CONTRIBUTING_FACTOR_2 has 5268945 (98.4%) missing values [Missing]
PERSON_SEX has 600519 (11.2%) missing values [Missing]
PERSON_AGE is highly skewed (γ1 = 71.62917362) [Skewed]
UNIQUE_ID has unique values [Unique]
PERSON_AGE has 547074 (10.2%) zeros [Zeros]
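The Imbalance percentages match a normalized-entropy score, 1 − H/log2(k). Here H is the Shannon entropy (in bits) of the category frequencies and k is the number of distinct non-missing categories. The report does not state this formula, so treat it as a reconstruction rather than ydata-profiling's documented definition. The sketch below reproduces the PERSON_TYPE, PERSON_INJURY and EJECTION figures from the counts shown in their variable sections, with missing values excluded.

```python
import math

def imbalance_score(counts):
    """Return 1 - H(p) / log2(k): 0 for perfectly balanced counts, near 1 when one class dominates."""
    k = len(counts)
    if k < 2:
        return 0.0  # a single category has no balance to measure
    total = sum(counts)
    # Shannon entropy in bits of the category frequencies
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)

# Non-missing category counts copied from the variable sections of this report.
person_type = [5150455, 126915, 67688, 9201]
person_injury = [4702746, 648386, 3127]
ejection = [2693892, 24499, 15891, 10739, 1272, 541]

print(f"{imbalance_score(person_type):.1%}")    # 86.1%
print(f"{imbalance_score(person_injury):.1%}")  # 66.0%
print(f"{imbalance_score(ejection):.1%}")       # 93.4%
```

The normalization by log2(k) is what makes variables with different numbers of categories comparable on one 0 to 100% scale.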

Reproduction

Analysis started: 2024-05-07 03:32:16.744572
Analysis finished: 2024-05-07 03:35:20.071068
Duration: 3 minutes and 3.33 seconds
Software version: ydata-profiling v4.7.0
Download configuration: config.json

Variables

UNIQUE_ID
Real number (ℝ)

UNIQUE 

Distinct: 5354259
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 9035720.7
Minimum: 10922
Maximum: 12968392
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.8 MiB

Quantile statistics

Minimum: 10922
5th percentile: 5802509.9
Q1: 6980392.5
Median: 9333883
Q3: 11383736
95th percentile: 12674086
Maximum: 12968392
Range: 12957470
Interquartile range (IQR): 4403343

Descriptive statistics

Standard deviation: 2618240.5
Coefficient of variation (CV): 0.28976554
Kurtosis: -0.07563761
Mean: 9035720.7
Median absolute deviation (MAD): 2206706
Skewness: -0.48457155
Sum: 4.8379589 × 10^13
Variance: 6.8551831 × 10^12
Monotonicity: Not monotonic
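The descriptive statistics above follow standard definitions. CV is the standard deviation divided by the mean, IQR is Q3 − Q1, and MAD is the median of absolute deviations from the median. Below is a minimal sketch with Python's `statistics` module. It assumes a sample (n − 1) standard deviation and linear-interpolation quartiles. These conventions are assumptions; the report does not state which ones the profiler uses.

```python
import statistics

def describe(values):
    """Recompute a few of the report's descriptive statistics for a list of numbers."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    median = statistics.median(values)
    # method="inclusive" interpolates linearly between order statistics
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean,
        "std": std,
        "cv": std / mean,
        "variance": std ** 2,
        "iqr": q3 - q1,
        "mad": statistics.median(abs(v - median) for v in values),
        "range": max(values) - min(values),
    }

# Cross-check UNIQUE_ID: CV should equal standard deviation / mean.
print(round(2618240.5 / 9035720.7, 5))  # 0.28977 (report: 0.28976554)
print(describe([1, 2, 3, 4, 5]))
```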
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
10249006 | 1 | < 0.1%
9412294 | 1 | < 0.1%
7428923 | 1 | < 0.1%
2236482 | 1 | < 0.1%
7207179 | 1 | < 0.1%
9988837 | 1 | < 0.1%
9465489 | 1 | < 0.1%
6107203 | 1 | < 0.1%
10102117 | 1 | < 0.1%
7337158 | 1 | < 0.1%
Other values (5354249) | 5354249 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
10922 | 1 | < 0.1%
79660 | 1 | < 0.1%
79953 | 1 | < 0.1%
79954 | 1 | < 0.1%
81004 | 1 | < 0.1%
81073 | 1 | < 0.1%
81886 | 1 | < 0.1%
82012 | 1 | < 0.1%
82146 | 1 | < 0.1%
82227 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
12968392 | 1 | < 0.1%
12968391 | 1 | < 0.1%
12968390 | 1 | < 0.1%
12968353 | 1 | < 0.1%
12968352 | 1 | < 0.1%
12968345 | 1 | < 0.1%
12968344 | 1 | < 0.1%
12968343 | 1 | < 0.1%
12968342 | 1 | < 0.1%
12968341 | 1 | < 0.1%

COLLISION_ID
Real number (ℝ)

Distinct: 1456093
Distinct (%): 27.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3949022.2
Minimum: 37
Maximum: 4722272
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.8 MiB

Quantile statistics

Minimum: 37
5th percentile: 3423239
Q1: 3677963
Median: 4002445
Q3: 4339124
95th percentile: 4645051
Maximum: 4722272
Range: 4722235
Interquartile range (IQR): 661161

Descriptive statistics

Standard deviation: 651086.15
Coefficient of variation (CV): 0.16487275
Kurtosis: 17.977044
Mean: 3949022.2
Median absolute deviation (MAD): 329916
Skewness: -3.5019178
Sum: 2.1144088 × 10^13
Variance: 4.2391318 × 10^11
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
3963775 | 77 | < 0.1%
4691158 | 71 | < 0.1%
3591272 | 66 | < 0.1%
3539636 | 65 | < 0.1%
3504309 | 64 | < 0.1%
3571716 | 62 | < 0.1%
3904409 | 61 | < 0.1%
3691734 | 61 | < 0.1%
4143411 | 60 | < 0.1%
3449201 | 60 | < 0.1%
Other values (1456083) | 5353612 | > 99.9%

Minimum 10 values

Value | Count | Frequency (%)
37 | 1 | < 0.1%
39 | 1 | < 0.1%
40 | 1 | < 0.1%
44 | 1 | < 0.1%
52 | 1 | < 0.1%
55 | 2 | < 0.1%
78 | 1 | < 0.1%
79 | 2 | < 0.1%
104 | 1 | < 0.1%
107 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4722272 | 1 | < 0.1%
4722270 | 7 | < 0.1%
4722268 | 3 | < 0.1%
4722265 | 4 | < 0.1%
4722264 | 5 | < 0.1%
4722263 | 3 | < 0.1%
4722260 | 2 | < 0.1%
4722259 | 3 | < 0.1%
4722254 | 2 | < 0.1%
4722253 | 2 | < 0.1%

Distinct: 4325
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 MiB
Minimum: 2012-07-01 00:00:00
Maximum: 2024-05-03 00:00:00
[Figure: histogram with fixed size bins (bins=50)]

Distinct: 1440
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 40.8 MiB
Minimum: 2024-05-06 00:00:00
Maximum: 2024-05-06 23:59:00
[Figure: histogram with fixed size bins (bins=50)]

Distinct: 5159436
Distinct (%): 96.4%
Missing: 19
Missing (%): < 0.1%
Memory size: 446.3 MiB

Length

Max length: 36
Median length: 36
Mean length: 30.4029
Min length: 1

Characters and Unicode

Total characters: 162784425
Distinct characters: 17
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 5159404
Unique (%): 96.4%

Sample

1st row: 31aa2bc0-f545-444f-8cdb-f1cb5cf00b89
2nd row: 4629e500-a73e-48dc-b8fb-53124d124b80
3rd row: ae48c136-1383-45db-83f4-2a5eecfb7cff
4th row: 2782525
5th row: e038e18f-40fb-4471-99cf-345eae36e064

Value | Count | Frequency (%)
1 | 142787 | 2.7%
2 | 31734 | 0.6%
3 | 11543 | 0.2%
4 | 4672 | 0.1%
5 | 2005 | < 0.1%
6 | 923 | < 0.1%
7 | 448 | < 0.1%
8 | 235 | < 0.1%
9 | 149 | < 0.1%
10 | 91 | < 0.1%
Other values (5159426) | 5159653 | 96.4%

Most occurring characters

Value | Count | Frequency (%)
- | 17478992 | 10.7%
4 | 13032417 | 8.0%
9 | 9762024 | 6.0%
8 | 9750714 | 6.0%
b | 9284397 | 5.7%
a | 9284146 | 5.7%
1 | 9080464 | 5.6%
2 | 8947375 | 5.5%
3 | 8733405 | 5.4%
7 | 8667294 | 5.3%
Other values (7) | 58763197 | 36.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 162784425 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 162784425 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 162784425 | 100.0%

PERSON_TYPE
Categorical

IMBALANCE 

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 332.3 MiB

Length

Max length: 15
Median length: 8
Mean length: 8.0720781
Min length: 8

Characters and Unicode

Total characters: 43219997
Distinct characters: 21
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Occupant
2nd row: Occupant
3rd row: Occupant
4th row: Occupant
5th row: Occupant

Common Values

Value | Count | Frequency (%)
Occupant | 5150455 | 96.2%
Pedestrian | 126915 | 2.4%
Bicyclist | 67688 | 1.3%
Other Motorized | 9201 | 0.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring words

Value | Count | Frequency (%)
occupant | 5150455 | 96.0%
pedestrian | 126915 | 2.4%
bicyclist | 67688 | 1.3%
other | 9201 | 0.2%
motorized | 9201 | 0.2%
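The word table gives slightly different percentages from Common Values: occupant is 96.0% here but Occupant is 96.2% there. The difference comes from the denominator, which here is the total number of words rather than the number of rows, and "Other Motorized" contributes two words per row. Below is a minimal sketch. It assumes words are lowercased and split on whitespace; the profiler's exact tokenization, especially its punctuation handling, may differ.

```python
from collections import Counter

def word_frequencies(value_counts):
    """Expand {category: row_count} into lowercase word counts and each word's share of all words."""
    words = Counter()
    for value, count in value_counts.items():
        for word in value.lower().split():
            words[word] += count
    total = sum(words.values())
    return {word: (n, n / total) for word, n in words.most_common()}

# Category counts copied from the PERSON_TYPE Common Values table.
person_type = {
    "Occupant": 5150455,
    "Pedestrian": 126915,
    "Bicyclist": 67688,
    "Other Motorized": 9201,
}

for word, (n, share) in word_frequencies(person_type).items():
    print(f"{word} {n} {share:.1%}")
```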

Most occurring characters

Value | Count | Frequency (%)
c | 10436286 | 24.1%
t | 5363460 | 12.4%
a | 5277370 | 12.2%
n | 5277370 | 12.2%
O | 5159656 | 11.9%
u | 5150455 | 11.9%
p | 5150455 | 11.9%
e | 272232 | 0.6%
i | 271492 | 0.6%
s | 194603 | 0.5%
Other values (11) | 666618 | 1.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 43219997 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 43219997 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 43219997 | 100.0%

PERSON_INJURY
Categorical

IMBALANCE 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 344.7 MiB

Length

Max length: 11
Median length: 11
Mean length: 10.512691
Min length: 6

Characters and Unicode

Total characters: 56287670
Distinct characters: 15
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Unspecified
2nd row: Unspecified
3rd row: Unspecified
4th row: Unspecified
5th row: Unspecified

Common Values

Value | Count | Frequency (%)
Unspecified | 4702746 | 87.8%
Injured | 648386 | 12.1%
Killed | 3127 | 0.1%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring words

Value | Count | Frequency (%)
unspecified | 4702746 | 87.8%
injured | 648386 | 12.1%
killed | 3127 | 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 10057005 | 17.9%
i | 9408619 | 16.7%
d | 5354259 | 9.5%
n | 5351132 | 9.5%
U | 4702746 | 8.4%
s | 4702746 | 8.4%
p | 4702746 | 8.4%
c | 4702746 | 8.4%
f | 4702746 | 8.4%
I | 648386 | 1.2%
Other values (5) | 1954539 | 3.5%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 56287670 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 56287670 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 56287670 | 100.0%

VEHICLE_ID
Real number (ℝ)

MISSING 

Distinct: 2476590
Distinct (%): 48.2%
Missing: 217425
Missing (%): 4.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 18521806
Minimum: 123423
Maximum: 20645072
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 40.8 MiB

Quantile statistics

Minimum: 123423
5th percentile: 17033806
Q1: 17545317
Median: 18692171
Q3: 19728357
95th percentile: 20475924
Maximum: 20645072
Range: 20521649
Interquartile range (IQR): 2183039.8

Descriptive statistics

Standard deviation: 1560660.3
Coefficient of variation (CV): 0.084260696
Kurtosis: 8.0441376
Mean: 18521806
Median absolute deviation (MAD): 1099290
Skewness: -1.8952034
Sum: 9.5143445 × 10^13
Variance: 2.4356606 × 10^12
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
18590796 | 71 | < 0.1%
17075216 | 63 | < 0.1%
17334601 | 63 | < 0.1%
17364088 | 60 | < 0.1%
18954743 | 58 | < 0.1%
18968693 | 58 | < 0.1%
17483298 | 58 | < 0.1%
17826063 | 58 | < 0.1%
19106096 | 57 | < 0.1%
17521817 | 57 | < 0.1%
Other values (2476580) | 5136231 | 95.9%
(Missing) | 217425 | 4.1%

Minimum 10 values

Value | Count | Frequency (%)
123423 | 1 | < 0.1%
602947 | 2 | < 0.1%
611686 | 1 | < 0.1%
620307 | 1 | < 0.1%
621082 | 2 | < 0.1%
622848 | 3 | < 0.1%
625915 | 1 | < 0.1%
628019 | 1 | < 0.1%
629935 | 1 | < 0.1%
630993 | 3 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
20645072 | 1 | < 0.1%
20645071 | 2 | < 0.1%
20645048 | 1 | < 0.1%
20645047 | 1 | < 0.1%
20645040 | 2 | < 0.1%
20645039 | 3 | < 0.1%
20645038 | 1 | < 0.1%
20645037 | 2 | < 0.1%
20645036 | 2 | < 0.1%
20645035 | 1 | < 0.1%

PERSON_AGE
Real number (ℝ)

MISSING  SKEWED  ZEROS 

Distinct: 886
Distinct (%): < 0.1%
Missing: 573413
Missing (%): 10.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.250145
Minimum: -999
Maximum: 9999
Zeros: 547074
Zeros (%): 10.2%
Negative: 1167
Negative (%): < 0.1%
Memory size: 40.8 MiB

Quantile statistics

Minimum: -999
5th percentile: 0
Q1: 24
Median: 35
Q3: 50
95th percentile: 68
Maximum: 9999
Range: 10998
Interquartile range (IQR): 26

Descriptive statistics

Standard deviation: 114.32082
Coefficient of variation (CV): 3.0690033
Kurtosis: 5755.5251
Mean: 37.250145
Median absolute deviation (MAD): 13
Skewness: 71.629174
Sum: 1.7808721 × 10^8
Variance: 13069.25
Monotonicity: Not monotonic
[Figure: histogram with fixed size bins (bins=50)]

Common values

Value | Count | Frequency (%)
0 | 547074 | 10.2%
30 | 108808 | 2.0%
29 | 108607 | 2.0%
28 | 108204 | 2.0%
27 | 107798 | 2.0%
31 | 104954 | 2.0%
26 | 104646 | 2.0%
32 | 103558 | 1.9%
25 | 100621 | 1.9%
33 | 100488 | 1.9%
Other values (876) | 3286088 | 61.4%
(Missing) | 573413 | 10.7%

Minimum 10 values

Value | Count | Frequency (%)
-999 | 8 | < 0.1%
-997 | 2 | < 0.1%
-996 | 1 | < 0.1%
-992 | 2 | < 0.1%
-991 | 1 | < 0.1%
-990 | 3 | < 0.1%
-989 | 1 | < 0.1%
-987 | 1 | < 0.1%
-982 | 3 | < 0.1%
-980 | 1 | < 0.1%

Maximum 10 values

Value | Count | Frequency (%)
9999 | 415 | < 0.1%
9262 | 1 | < 0.1%
9232 | 1 | < 0.1%
9211 | 1 | < 0.1%
9191 | 1 | < 0.1%
9151 | 1 | < 0.1%
9122 | 1 | < 0.1%
8041 | 1 | < 0.1%
7301 | 2 | < 0.1%
7275 | 2 | < 0.1%
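The PERSON_AGE extremes look like placeholder codes rather than real ages: values down to -999, 9999 occurring 415 times, and 547074 zeros. They would plausibly account for the skewness of 71.6 and a standard deviation of 114 against a median of 35. The report does not say what these codes mean. The sketch below treats anything outside a hypothetical plausible range of 0 to 120 as missing before recomputing summary statistics. Both the bounds and the choice to keep zeros (which may be genuine infant ages or another placeholder) are assumptions to check against the dataset's own documentation.

```python
import statistics

# Hypothetical plausibility bounds; not defined anywhere in the profiling report.
MIN_AGE, MAX_AGE = 0, 120

def clean_ages(ages, lo=MIN_AGE, hi=MAX_AGE):
    """Map None and out-of-range codes (e.g. -999, 9999) to None.

    float('nan') also fails the range comparison, so it maps to None as well.
    """
    return [a if a is not None and lo <= a <= hi else None for a in ages]

def summary(ages):
    """Mean and sample standard deviation of the non-missing ages."""
    present = [a for a in ages if a is not None]
    return statistics.mean(present), statistics.stdev(present)

sample = [30, 29, 35, 50, 24, 68, 9999, -999, None]  # illustrative values, not report data
print(clean_ages(sample))
print(summary(sample))              # distorted by the -999 / 9999 codes
print(summary(clean_ages(sample)))  # codes treated as missing
```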

EJECTION
Categorical

IMBALANCE  MISSING 

Distinct: 6
Distinct (%): < 0.1%
Missing: 2607425
Missing (%): 48.7%
Memory size: 337.3 MiB

Length

Max length: 17
Median length: 11
Mean length: 11.002497
Min length: 7

Characters and Unicode

Total characters: 30222033
Distinct characters: 24
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Not Ejected
2nd row: Not Ejected
3rd row: Not Ejected
4th row: Not Ejected
5th row: Not Ejected

Common Values

Value | Count | Frequency (%)
Not Ejected | 2693892 | 50.3%
Ejected | 24499 | 0.5%
Does Not Apply | 15891 | 0.3%
Partially Ejected | 10739 | 0.2%
Trapped | 1272 | < 0.1%
Unknown | 541 | < 0.1%
(Missing) | 2607425 | 48.7%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring words

Value | Count | Frequency (%)
ejected | 2729130 | 49.8%
not | 2709783 | 49.4%
does | 15891 | 0.3%
apply | 15891 | 0.3%
partially | 10739 | 0.2%
trapped | 1272 | < 0.1%
unknown | 541 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
e | 5475423 | 18.1%
t | 5449652 | 18.0%
(space) | 2736413 | 9.1%
d | 2730402 | 9.0%
E | 2729130 | 9.0%
j | 2729130 | 9.0%
c | 2729130 | 9.0%
o | 2726215 | 9.0%
N | 2709783 | 9.0%
l | 37369 | 0.1%
Other values (14) | 169386 | 0.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 30222033 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 30222033 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 30222033 | 100.0%

EMOTIONAL_STATUS
Categorical

IMBALANCE  MISSING 

Distinct: 8
Distinct (%): < 0.1%
Missing: 2525502
Missing (%): 47.2%
Memory size: 343.3 MiB

Length

Max length: 14
Median length: 14
Mean length: 13.11888
Min length: 5

Characters and Unicode

Total characters: 37110123
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does Not Apply
2nd row: Does Not Apply
3rd row: Conscious
4th row: Conscious
5th row: Does Not Apply

Common Values

Value | Count | Frequency (%)
Does Not Apply | 2340767 | 43.7%
Conscious | 452451 | 8.5%
Unknown | 13540 | 0.3%
Shock | 13108 | 0.2%
Semiconscious | 2724 | 0.1%
Unconscious | 2536 | < 0.1%
Apparent Death | 1847 | < 0.1%
Incoherent | 1784 | < 0.1%
(Missing) | 2525502 | 47.2%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring words

Value | Count | Frequency (%)
does | 2340767 | 31.2%
not | 2340767 | 31.2%
apply | 2340767 | 31.2%
conscious | 452451 | 6.0%
unknown | 13540 | 0.2%
shock | 13108 | 0.2%
semiconscious | 2724 | < 0.1%
unconscious | 2536 | < 0.1%
apparent | 1847 | < 0.1%
death | 1847 | < 0.1%

Most occurring characters

Value | Count | Frequency (%)
o | 5625388 | 15.2%
p | 4685228 | 12.6%
(space) | 4683381 | 12.6%
s | 3256189 | 8.8%
e | 2350753 | 6.3%
t | 2346245 | 6.3%
D | 2342614 | 6.3%
A | 2342614 | 6.3%
N | 2340767 | 6.3%
l | 2340767 | 6.3%
Other values (15) | 4796177 | 12.9%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 37110123 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 37110123 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 37110123 | 100.0%

BODILY_INJURY
Categorical

IMBALANCE  MISSING 

Distinct: 14
Distinct (%): < 0.1%
Missing: 2525459
Missing (%): 47.2%
Memory size: 343.8 MiB

Length

Max length: 20
Median length: 14
Mean length: 13.311595
Min length: 3

Characters and Unicode

Total characters: 37655839
Distinct characters: 36
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Does Not Apply
2nd row: Does Not Apply
3rd row: Back
4th row: Shoulder - Upper Arm
5th row: Does Not Apply

Common Values

Value | Count | Frequency (%)
Does Not Apply | 2371373 | 44.3%
Back | 76604 | 1.4%
Neck | 73655 | 1.4%
Knee-Lower Leg Foot | 69953 | 1.3%
Head | 63462 | 1.2%
Entire Body | 37106 | 0.7%
Elbow-Lower-Arm-Hand | 31223 | 0.6%
Shoulder - Upper Arm | 31185 | 0.6%
Unknown | 19935 | 0.4%
Chest | 16808 | 0.3%
Other values (4) | 37496 | 0.7%
(Missing) | 2525459 | 47.2%

Length

[Figure: histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
does | 2371373 | 30.1%
apply | 2371373 | 30.1%
not | 2371373 | 30.1%
leg | 86243 | 1.1%
back | 76604 | 1.0%
neck | 73655 | 0.9%
knee-lower | 69953 | 0.9%
foot | 69953 | 0.9%
head | 63462 | 0.8%
(blank) | 39268 | 0.5%
Other values (13) | 281312 | 3.6%

Most occurring characters

Value | Count | Frequency (%)
o | 5111360 | 13.6%
(space) | 5045769 | 13.4%
p | 4853986 | 12.9%
e | 2997678 | 8.0%
t | 2495240 | 6.6%
N | 2445028 | 6.5%
A | 2441864 | 6.5%
l | 2441864 | 6.5%
y | 2409354 | 6.4%
s | 2396264 | 6.4%
Other values (26) | 5017432 | 13.3%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 37655839 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 37655839 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 37655839 | 100.0%

POSITION_IN_VEHICLE
Categorical

IMBALANCE  MISSING 

Distinct: 11
Distinct (%): < 0.1%
Missing: 2607042
Missing (%): 48.7%
Memory size: 372.8 MiB

Length

Max length: 86
Median length: 6
Mean length: 24.571
Min length: 6

Characters and Unicode

Total characters: 67501868
Distinct characters: 39
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Front passenger, if two or more persons, including the driver, are in the front seat
2nd row: Right rear passenger or motorcycle sidecar passenger
3rd row: Driver
4th row: Driver
5th row: Driver

Common Values

Value | Count | Frequency (%)
Driver | 1919815 | 35.9%
Front passenger, if two or more persons, including the driver, are in the front seat | 337021 | 6.3%
Right rear passenger or motorcycle sidecar passenger | 137225 | 2.6%
Left rear passenger, or rear passenger on a bicycle, motorcycle, snowmobile | 128534 | 2.4%
Any person in the rear of a station wagon, pick-up truck, all passengers on a bus, etc | 74244 | 1.4%
Unknown | 63889 | 1.2%
Middle rear seat, or passenger lying across a seat | 41514 | 0.8%
Middle front seat, or passenger lying across a seat | 33717 | 0.6%
Riding/Hanging on Outside | 7114 | 0.1%
Does Not Apply | 3246 | 0.1%
(Missing) | 2607042 | 48.7%

Length

[Figure: histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
driver | 2256836 | 19.8%
passenger | 943770 | 8.3%
the | 748286 | 6.6%
front | 707759 | 6.2%
or | 678011 | 6.0%
rear | 510051 | 4.5%
seat | 487483 | 4.3%
in | 411265 | 3.6%
a | 352253 | 3.1%
if | 337919 | 3.0%
Other values (38) | 3958224 | 34.7%

Most occurring characters

Value | Count | Frequency (%)
r | 9578018 | 14.2%
(space) | 8644640 | 12.8%
e | 8077800 | 12.0%
i | 4538986 | 6.7%
n | 4076233 | 6.0%
s | 3926498 | 5.8%
o | 3843287 | 5.7%
a | 3150716 | 4.7%
t | 3121199 | 4.6%
v | 2256836 | 3.3%
Other values (29) | 16287655 | 24.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 67501868 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 67501868 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 67501868 | 100.0%

SAFETY_EQUIPMENT
Categorical

IMBALANCE  MISSING 

Distinct: 17
Distinct (%): < 0.1%
Missing: 2779911
Missing (%): 51.9%
Memory size: 346.1 MiB

Length

Max length: 40
Median length: 18
Mean length: 14.874828
Min length: 1

Characters and Unicode

Total characters: 38292984
Distinct characters: 37
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Lap Belt & Harness
2nd row: Lap Belt
3rd row: Lap Belt & Harness
4th row: Lap Belt & Harness
5th row: Lap Belt & Harness

Common Values

Value | Count | Frequency (%)
Lap Belt & Harness | 1648825 | 30.8%
Unknown | 429584 | 8.0%
Lap Belt | 361354 | 6.7%
Child Restraint Only | 44909 | 0.8%
Air Bag Deployed/Lap Belt/Harness | 18962 | 0.4%
Other | 14920 | 0.3%
Helmet (Motorcycle Only) | 13070 | 0.2%
Harness | 12181 | 0.2%
Helmet Only (In-Line Skater/Bicyclist) | 9984 | 0.2%
- | 7185 | 0.1%
Other values (7) | 13374 | 0.2%
(Missing) | 2779911 | 51.9%

Length

[Figure: histogram of lengths of the category]

Most occurring words

Value | Count | Frequency (%)
belt | 2013160 | 24.8%
lap | 2010181 | 24.8%
harness | 1661006 | 20.5%
(blank) | 1656010 | 20.4%
unknown | 429584 | 5.3%
only | 68581 | 0.8%
restraint | 45449 | 0.6%
child | 44909 | 0.6%
air | 28751 | 0.4%
bag | 28751 | 0.4%
Other values (12) | 129476 | 1.6%

Most occurring characters

Value | Count | Frequency (%)
(space) | 5541510 | 14.5%
e | 3925711 | 10.3%
a | 3799940 | 9.9%
s | 3419574 | 8.9%
n | 3109886 | 8.1%
l | 2227561 | 5.8%
t | 2207669 | 5.8%
B | 2074442 | 5.4%
p | 2061953 | 5.4%
L | 2045691 | 5.3%
Other values (27) | 7879047 | 20.6%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 38292984 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 38292984 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
(unknown) | 38292984 | 100.0%

PED_LOCATION
Categorical

MISSING 

Distinct: 4
Distinct (%): < 0.1%
Missing: 5267506
Missing (%): 98.4%
Memory size: 330.5 MiB

Length

Max length: 57
Median length: 53
Mean length: 51.447108
Min length: 7

Characters and Unicode

Total characters: 4463191
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
2nd row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
3rd row: Pedestrian/Bicyclist/Other Pedestrian Not at Intersection
4th row: Pedestrian/Bicyclist/Other Pedestrian at Intersection
5th row: Pedestrian/Bicyclist/Other Pedestrian at Intersection

Common Values

Value | Count | Frequency (%)
Pedestrian/Bicyclist/Other Pedestrian at Intersection | 52803 | 1.0%
Pedestrian/Bicyclist/Other Pedestrian Not at Intersection | 28060 | 0.5%
Does Not Apply | 3426 | 0.1%
Unknown | 2464 | < 0.1%
(Missing) | 5267506 | 98.4%

Length

[Figure: histogram of lengths of the category]

Common Values (Plot)

[Figure: plot of common values]

Most occurring words

Value | Count | Frequency (%)
pedestrian/bicyclist/other | 80863 | 22.2%
pedestrian | 80863 | 22.2%
at | 80863 | 22.2%
intersection | 80863 | 22.2%
not | 31486 | 8.6%
does | 3426 | 0.9%
apply | 3426 | 0.9%
unknown | 2464 | 0.7%

Most occurring characters

Value | Count | Frequency (%)
t | 597527 | 13.4%
e | 569467 | 12.8%
i | 404315 | 9.1%
n | 330844 | 7.4%
s | 326878 | 7.3%
r | 323452 | 7.2%
(space) | 277501 | 6.2%
a | 242589 | 5.4%
c | 242589 | 5.4%
P | 161726 | 3.6%
Other values (16) | 986303 | 22.1%

Most occurring categories

Value | Count | Frequency (%)
(unknown) | 4463191 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
(unknown) | 4463191 | 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 4463191 100.0%

PED_ACTION
Categorical

MISSING 

Distinct 16
Distinct (%) < 0.1%
Missing 5267607
Missing (%) 98.4%
Memory size 328.2 MiB

Crossing With Signal 32525
Crossing, No Signal, or Crosswalk 14729
Crossing, No Signal, Marked Crosswalk 7424
Other Actions in Roadway 6696
Crossing Against Signal 6058
Other values (11) 19220

Length

Max length 47
Median length 44
Mean length 24.471068
Min length 7
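Mean length is consistent with total characters divided by the number of non-missing values: for PED_ACTION, 2120467 / (5354259 - 5267607) = 2120467 / 86652 ≈ 24.471068. A short sketch with a hypothetical helper, which also reproduces the COMPLAINT (14.921692) and PED_ROLE (8.2661449) figures reported further down:

```python
def mean_length(total_characters: int, n_rows: int, n_missing: int) -> float:
    """Average string length over the non-missing values of a column."""
    n_present = n_rows - n_missing
    if n_present <= 0:
        raise ValueError("column has no non-missing values")
    return total_characters / n_present


if __name__ == "__main__":
    N_ROWS = 5354259  # Number of observations in this report
    print(f"PED_ACTION: {mean_length(2120467, N_ROWS, 5267607):.6f}")
    print(f"COMPLAINT:  {mean_length(42210588, N_ROWS, 2525452):.6f}")
```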

Characters and Unicode

Total characters 2120467
Distinct characters 41
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Crossing With Signal
2nd row Crossing With Signal
3rd row Crossing, No Signal, or Crosswalk
4th row Crossing With Signal
5th row Crossing With Signal

Common Values

Value Count Frequency (%)
Crossing With Signal 32525 0.6%
Crossing, No Signal, or Crosswalk 14729 0.3%
Crossing, No Signal, Marked Crosswalk 7424 0.1%
Other Actions in Roadway 6696 0.1%
Crossing Against Signal 6058 0.1%
Unknown 4142 0.1%
Not in Roadway 4033 0.1%
Does Not Apply 3852 0.1%
Emerging from in Front of/Behind Parked Vehicle 2765 0.1%
Working in Roadway 1321 < 0.1%
Other values (6) 3107 0.1%
(Missing) 5267607 98.4%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value Count Frequency (%)
crossing 60736 18.9%
signal 60736 18.9%
with 33390 10.4%
crosswalk 22153 6.9%
no 22153 6.9%
in 15291 4.8%
or 14729 4.6%
roadway 12526 3.9%
other 7887 2.5%
not 7885 2.5%
Other values (29) 63454 19.8%

Most occurring characters

Value Count Frequency (%)
(space) 234288 11.0%
i 201925 9.5%
s 184150 8.7%
n 180071 8.5%
o 168884 8.0%
g 141442 6.7%
a 129809 6.1%
r 126969 6.0%
l 94735 4.5%
C 83108 3.9%
Other values (31) 575086 27.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 2120467 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 2120467 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 2120467 100.0%

COMPLAINT
Categorical

IMBALANCE MISSING

Distinct 21
Distinct (%) < 0.1%
Missing 2525452
Missing (%) 47.2%
Memory size 348.2 MiB

Does Not Apply 2372169
Complaint of Pain or Nausea 201250
Complaint of Pain 88497
None Visible 46571
Minor Bleeding 24567
Other values (16) 95753
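The IMBALANCE badge marks a column dominated by a few categories; the report overview lists COMPLAINT as 75.7% imbalanced, with "Does Not Apply" alone covering 2372169 of its 2828807 non-missing rows. One common way to express imbalance is one minus the normalized Shannon entropy of the category proportions. Treat this as an assumed definition for illustration, not as the profiler's implementation: the 75.7% figure cannot be re-derived here because 16 of the 21 categories are folded into "Other values". The helper name is hypothetical.

```python
import math


def imbalance_score(counts: list[int]) -> float:
    """1 - H / log2(k): 0.0 for an even split, approaching 1.0 as one class dominates.

    Assumed definition: H is the base-2 Shannon entropy of the class
    proportions and k the number of classes with a non-zero count.
    """
    nonzero = [c for c in counts if c > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-empty classes")
    total = sum(nonzero)
    entropy = -sum((c / total) * math.log2(c / total) for c in nonzero)
    return 1 - entropy / math.log2(len(nonzero))


if __name__ == "__main__":
    # PERSON_SEX counts (M, F, U) from this report; that column is not flagged as imbalanced.
    print(round(imbalance_score([2878999, 1447674, 427067]), 3))
```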

Length

Max length 34
Median length 14
Mean length 14.921692
Min length 7

Characters and Unicode

Total characters 42210588
Distinct characters 39
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Does Not Apply
2nd row Does Not Apply
3rd row Complaint of Pain or Nausea
4th row None Visible
5th row Does Not Apply

Common Values

Value Count Frequency (%)
Does Not Apply 2372169 44.3%
Complaint of Pain or Nausea 201250 3.8%
Complaint of Pain 88497 1.7%
None Visible 46571 0.9%
Minor Bleeding 24567 0.5%
Contusion - Bruise 19260 0.4%
Unknown 19049 0.4%
Whiplash 18560 0.3%
Abrasion 14033 0.3%
Internal 7395 0.1%
Other values (11) 17456 0.3%
(Missing) 2525452 47.2%

Length

[Figure: Histogram of lengths of the category]

Most occurring words

Value Count Frequency (%)
does 2372169 27.3%
not 2372169 27.3%
apply 2372169 27.3%
complaint 289747 3.3%
of 289747 3.3%
pain 289747 3.3%
or 201250 2.3%
nausea 201250 2.3%
none 46571 0.5%
visible 46571 0.5%
Other values (21) 216030 2.5%

Most occurring characters

Value Count Frequency (%)
(space) 5868613 13.9%
o 5697405 13.5%
p 5052798 12.0%
e 2775795 6.6%
l 2769320 6.6%
t 2716250 6.4%
s 2714180 6.4%
N 2619990 6.2%
A 2386355 5.7%
D 2385200 5.7%
Other values (29) 7224682 17.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 42210588 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 42210588 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 42210588 100.0%

PED_ROLE
Categorical

MISSING 

Distinct 10
Distinct (%) < 0.1%
Missing 194889
Missing (%) 3.6%
Memory size 333.0 MiB

Registrant 2220389
Driver 1964972
Passenger 776908
Pedestrian 85171
Witness 72450
Other values (5) 39480

Length

Max length 15
Median length 14
Mean length 8.2661449
Min length 5

Characters and Unicode

Total characters 42648100
Distinct characters 30
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row Registrant
2nd row Passenger
3rd row Registrant
4th row Notified Person
5th row Passenger

Common Values

Value Count Frequency (%)
Registrant 2220389 41.5%
Driver 1964972 36.7%
Passenger 776908 14.5%
Pedestrian 85171 1.6%
Witness 72450 1.4%
Owner 26682 0.5%
Notified Person 8344 0.2%
Policy Holder 2415 < 0.1%
Other 1685 < 0.1%
In-Line Skater 354 < 0.1%
(Missing) 194889 3.6%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most occurring words

Value Count Frequency (%)
registrant 2220389 42.9%
driver 1964972 38.0%
passenger 776908 15.0%
pedestrian 85171 1.6%
witness 72450 1.4%
owner 26682 0.5%
notified 8344 0.2%
person 8344 0.2%
policy 2415 < 0.1%
holder 2415 < 0.1%
Other values (3) 2393 < 0.1%

Most occurring characters

Value Count Frequency (%)
r 7051892 16.5%
e 6030147 14.1%
t 4608782 10.8%
i 4362439 10.2%
s 4012620 9.4%
n 3190652 7.5%
a 3082822 7.2%
g 2997297 7.0%
R 2220389 5.2%
D 1964972 4.6%
Other values (20) 3126088 7.3%

Most occurring categories

Value Count Frequency (%)
(unknown) 42648100 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 42648100 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 42648100 100.0%

CONTRIBUTING_FACTOR_1
Text

MISSING 

Distinct 53
Distinct (%) 0.1%
Missing 5268834
Missing (%) 98.4%
Memory size 167.0 MiB

Length

Max length 53
Median length 11
Mean length 19.60295
Min length 5

Characters and Unicode

Total characters 1674582
Distinct characters 52
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) < 0.1%
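The Distinct, Distinct (%) and Unique figures are consistent with two readings. First, Distinct (%) is taken over non-missing values: 53 / (5354259 - 5268834) = 53 / 85425 ≈ 0.06%, shown as 0.1%. Second, Unique counts values that occur exactly once, which is why PED_LOCATION has 4 distinct values but Unique 0. A minimal sketch of those definitions, with `None` marking a missing cell; the helper name and input format are hypothetical.

```python
from collections import Counter
from typing import Optional


def column_stats(values: list[Optional[str]]) -> dict[str, float]:
    """Distinct, Distinct (%), Unique and Missing (%) for one column."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "distinct": len(counts),
        # Denominator assumed to be the number of non-missing values.
        "distinct_pct": 100 * len(counts) / len(present) if present else 0.0,
        # Values that occur exactly once in the column.
        "unique": sum(1 for n in counts.values() if n == 1),
        "missing_pct": 100 * (len(values) - len(present)) / len(values) if values else 0.0,
    }


if __name__ == "__main__":
    print(column_stats(["Unspecified", "Unspecified", "Driver Inattention/Distraction", None]))
```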

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified

Most occurring words

Value Count Frequency (%)
unspecified 59729 45.2%
pedestrian/bicyclist/other 13527 10.2%
pedestrian 13527 10.2%
error/confusion 13527 10.2%
driver 3177 2.4%
inattention/distraction 3099 2.3%
to 2321 1.8%
failure 2263 1.7%
right-of-way 2241 1.7%
yield 2241 1.7%
Other values (90) 16503 12.5%

Most occurring characters

Value Count Frequency (%)
i 215196 12.9%
e 212540 12.7%
n 134022 8.0%
s 122238 7.3%
r 103575 6.2%
c 95265 5.7%
d 94213 5.6%
t 80850 4.8%
f 78813 4.7%
p 60884 3.6%
Other values (42) 476986 28.5%

Most occurring categories

Value Count Frequency (%)
(unknown) 1674582 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1674582 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1674582 100.0%

CONTRIBUTING_FACTOR_2
Text

MISSING 

Distinct 51
Distinct (%) 0.1%
Missing 5268945
Missing (%) 98.4%
Memory size 166.6 MiB

Length

Max length 53
Median length 11
Mean length 13.841515
Min length 5

Characters and Unicode

Total characters 1180875
Distinct characters 52
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 6
Unique (%) < 0.1%

Sample

1st row Unspecified
2nd row Unspecified
3rd row Unspecified
4th row Unspecified
5th row Unspecified

Most occurring words

Value Count Frequency (%)
unspecified 75142 72.6%
pedestrian/bicyclist/other 3738 3.6%
pedestrian 3738 3.6%
error/confusion 3738 3.6%
driver 1343 1.3%
inattention/distraction 1221 1.2%
to 1213 1.2%
failure 1181 1.1%
yield 1158 1.1%
right-of-way 1158 1.1%
Other values (84) 9925 9.6%

Most occurring characters

Value Count Frequency (%)
i 183294 15.5%
e 182730 15.5%
n 99331 8.4%
s 94694 8.0%
d 87313 7.4%
c 86861 7.4%
f 82347 7.0%
p 76084 6.4%
U 75601 6.4%
r 34647 2.9%
Other values (42) 177973 15.1%

Most occurring categories

Value Count Frequency (%)
(unknown) 1180875 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 1180875 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 1180875 100.0%

PERSON_SEX
Categorical

MISSING 

Distinct 3
Distinct (%) < 0.1%
Missing 600519
Missing (%) 11.2%
Memory size 299.6 MiB

M 2878999
F 1447674
U 427067

Length

Max length 1
Median length 1
Mean length 1
Min length 1

Characters and Unicode

Total characters 4753740
Distinct characters 3
Distinct categories 1
Distinct scripts 1
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 0
Unique (%) 0.0%

Sample

1st row U
2nd row F
3rd row M
4th row F
5th row M

Common Values

Value Count Frequency (%)
M 2878999 53.8%
F 1447674 27.0%
U 427067 8.0%
(Missing) 600519 11.2%

Length

[Figure: Histogram of lengths of the category]

[Figure: Common Values (Plot)]

Most occurring words

Value Count Frequency (%)
m 2878999 60.6%
f 1447674 30.5%
u 427067 9.0%

Most occurring characters

Value Count Frequency (%)
M 2878999 60.6%
F 1447674 30.5%
U 427067 9.0%

Most occurring categories

Value Count Frequency (%)
(unknown) 4753740 100.0%

Most occurring scripts

Value Count Frequency (%)
(unknown) 4753740 100.0%

Most occurring blocks

Value Count Frequency (%)
(unknown) 4753740 100.0%

Interactions

[Figures: 16 interaction plots]

Missing values

[Figure: A simple visualization of nullity by column.]
[Figure: Nullity matrix. A data-dense display that lets you quickly pick out patterns in data completion visually.]
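A nullity matrix makes co-missing columns visible; in the sample rows that follow, for instance, every Registrant row is missing EJECTION through COMPLAINT together. The same pattern can be tallied numerically by counting rows per missingness pattern. This is a minimal stdlib sketch, assuming each record is a dict with `None` for a missing cell; the function name and record layout are hypothetical, not part of the profiler.

```python
from collections import Counter
from typing import Optional

# Hypothetical record layout: one dict per row, None marking a missing cell.
Row = dict[str, Optional[str]]


def nullity_patterns(rows: list[Row], columns: list[str]) -> Counter:
    """Count rows by their missingness pattern over `columns` (True = missing)."""
    return Counter(tuple(row.get(col) is None for col in columns) for row in rows)


if __name__ == "__main__":
    cols = ["EJECTION", "POSITION_IN_VEHICLE", "PED_ROLE"]
    rows = [
        {"EJECTION": None, "POSITION_IN_VEHICLE": None, "PED_ROLE": "Registrant"},
        {"EJECTION": "Not Ejected", "POSITION_IN_VEHICLE": "Driver", "PED_ROLE": "Driver"},
        {"EJECTION": None, "POSITION_IN_VEHICLE": None, "PED_ROLE": "Registrant"},
    ]
    for pattern, n in nullity_patterns(rows, cols).most_common():
        print(dict(zip(cols, pattern)), n)
```

A key absent from a record is treated the same as an explicit `None`, since `dict.get` returns `None` for missing keys.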

Sample

First rows

| | UNIQUE_ID | COLLISION_ID | CRASH_DATE | CRASH_TIME | PERSON_ID | PERSON_TYPE | PERSON_INJURY | VEHICLE_ID | PERSON_AGE | EJECTION | EMOTIONAL_STATUS | BODILY_INJURY | POSITION_IN_VEHICLE | SAFETY_EQUIPMENT | PED_LOCATION | PED_ACTION | COMPLAINT | PED_ROLE | CONTRIBUTING_FACTOR_1 | CONTRIBUTING_FACTOR_2 | PERSON_SEX |
| 0 | 10249006 | 4229554 | 10/26/2019 | 9:43 | 31aa2bc0-f545-444f-8cdb-f1cb5cf00b89 | Occupant | Unspecified | 19141108.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | U |
| 1 | 10255054 | 4230587 | 10/25/2019 | 15:15 | 4629e500-a73e-48dc-b8fb-53124d124b80 | Occupant | Unspecified | 19144075.0 | 33.0 | Not Ejected | Does Not Apply | Does Not Apply | Front passenger, if two or more persons, including the driver, are in the front seat | Lap Belt & Harness | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | F |
| 2 | 10253177 | 4230550 | 10/26/2019 | 17:55 | ae48c136-1383-45db-83f4-2a5eecfb7cff | Occupant | Unspecified | 19143133.0 | 55.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M |
| 3 | 6650180 | 3565527 | 11/21/2016 | 13:05 | 2782525 | Occupant | Unspecified | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Notified Person | NaN | NaN | NaN |
| 4 | 10255516 | 4231168 | 10/25/2019 | 11:16 | e038e18f-40fb-4471-99cf-345eae36e064 | Occupant | Unspecified | 19144329.0 | 7.0 | Not Ejected | Does Not Apply | Does Not Apply | Right rear passenger or motorcycle sidecar passenger | Lap Belt | NaN | NaN | Does Not Apply | Passenger | NaN | NaN | F |
| 5 | 10253606 | 4230743 | 10/24/2019 | 19:15 | 84bcb3a7-d201-4c61-9e30-fe29268c1074 | Occupant | Injured | 19143343.0 | 27.0 | Not Ejected | Conscious | Back | Driver | Lap Belt & Harness | NaN | NaN | Complaint of Pain or Nausea | Driver | NaN | NaN | M |
| 6 | 10251336 | 4230047 | 10/26/2019 | 16:45 | 21064a07-a945-49d0-af97-5446801b20ce | Occupant | Unspecified | 19142198.0 | 41.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | F |
| 7 | 10248708 | 4229547 | 10/26/2019 | 1:15 | a8904763-2870-42f3-865c-b53d8e5156e2 | Pedestrian | Injured | NaN | 24.0 | NaN | Conscious | Shoulder - Upper Arm | NaN | NaN | Pedestrian/Bicyclist/Other Pedestrian at Intersection | Crossing With Signal | None Visible | Pedestrian | Unspecified | Unspecified | F |
| 8 | 10250179 | 4229808 | 10/26/2019 | 13:04 | c3fc715e-203f-462d-9e8b-6a41fc378703 | Occupant | Unspecified | 19141630.0 | 36.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | Lap Belt & Harness | NaN | NaN | Does Not Apply | Driver | NaN | NaN | M |
| 9 | 10253792 | 4230915 | 10/24/2019 | 8:20 | 793ac6c6-cbc7-4ab3-ab95-09f9312f1123 | Occupant | Unspecified | 19143438.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | U |

Last rows

| | UNIQUE_ID | COLLISION_ID | CRASH_DATE | CRASH_TIME | PERSON_ID | PERSON_TYPE | PERSON_INJURY | VEHICLE_ID | PERSON_AGE | EJECTION | EMOTIONAL_STATUS | BODILY_INJURY | POSITION_IN_VEHICLE | SAFETY_EQUIPMENT | PED_LOCATION | PED_ACTION | COMPLAINT | PED_ROLE | CONTRIBUTING_FACTOR_1 | CONTRIBUTING_FACTOR_2 | PERSON_SEX |
| 5354249 | 12963943 | 4721291 | 05/02/2024 | 2:10 | 47f20bc8-b903-4ffa-9840-7fc87d12066f | Occupant | Unspecified | 20642545.0 | 64.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M |
| 5354250 | 12965756 | 4721733 | 04/22/2024 | 6:42 | bbcc9078-12b8-4fe5-b0cc-cfe165ec8c1a | Occupant | Unspecified | 20643561.0 | 38.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | Unknown | NaN | NaN | Does Not Apply | Driver | NaN | NaN | M |
| 5354251 | 12968330 | 4722017 | 05/03/2024 | 17:20 | c48bad04-5cd4-4570-920d-abf2c71c270e | Occupant | Unspecified | 20645029.0 | 64.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | F |
| 5354252 | 12968114 | 4722236 | 05/03/2024 | 10:24 | 9c045e6c-7c2c-4198-a025-d6319532f6f7 | Occupant | Unspecified | 20644894.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M |
| 5354253 | 12966554 | 4721888 | 05/02/2024 | 10:17 | 7a6e7e8a-5600-43fa-bd94-9b31f659b008 | Occupant | Unspecified | 20643991.0 | 30.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | F |
| 5354254 | 12967198 | 4721912 | 05/03/2024 | 10:04 | 23a9a06c-39bd-4244-9edc-e6b580e6a40f | Occupant | Unspecified | 20644351.0 | 37.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M |
| 5354255 | 12967750 | 4722043 | 05/03/2024 | 15:51 | f5a87e1f-aae8-4985-b127-70def1af4e2a | Occupant | Unspecified | 20644670.0 | 39.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | M |
| 5354256 | 12967293 | 4722049 | 05/03/2024 | 22:38 | 0ad1dc57-3f7a-46ab-8f16-2068d0e67be5 | Occupant | Unspecified | 20644397.0 | 15.0 | Not Ejected | Does Not Apply | Does Not Apply | Driver | NaN | NaN | NaN | Does Not Apply | Driver | NaN | NaN | M |
| 5354257 | 12965981 | 4721641 | 04/29/2024 | 11:30 | d4e5d2f5-b63b-40d6-992c-2ef10d42deb8 | Occupant | Unspecified | 20643681.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Registrant | NaN | NaN | NaN |
| 5354258 | 12965829 | 4721611 | 04/30/2024 | 18:22 | 82494184-4282-4154-9623-989a225d9074 | Occupant | Injured | 20643594.0 | 25.0 | Not Ejected | Conscious | Shoulder - Upper Arm | Front passenger, if two or more persons, including the driver, are in the front seat | Lap Belt | NaN | NaN | Complaint of Pain or Nausea | Passenger | NaN | NaN | F |